ABSTRACT
Two iterative procedures for constructing Rasch scales are
presented. A log-likelihood ratio test based upon a
quasi-loglinear formulation of the Rasch model is given by which one
item at a time can be deleted from or added to an initial item set. 
In the so-called "top-down" algorithm, items are stepwise deleted
from a relatively large initial item set, whereas in the "bottom-up"
algorithm items are stepwise added to a relatively small initial item
set. Both algorithms are evaluated through a simulation study with
generated data. Item parameters are given for four generated 
unidimensional data sets and two generated two-dimensional sets. 
Abilities were randomly sampled from a multivariate normal 
distribution with a sample size of 1,000. Results for the top-down 
algorithm were poor, but results for the bottom-up algorithm were
more encouraging. It is suggested that alternating the bottom-up 
algorithm with one or two iterations of the top-down algorithm would 
allow the procedure to reject items that were added incorrectly in a 
previous step. Eight tables illustrate the item parameters and the 
use of both algorithms for the generated data. (SLD) 
Abstract

Two iterative procedures for constructing Rasch scales 
are presented. A log-likelihood ratio test based upon a 
quasi-loglinear formulation of the Rasch model is given by 
which one item at a time can be deleted from or added to an 
initial item set. In the so-called top-down algorithm, items
are stepwise deleted from a relatively large initial item set,
whereas in the bottom-up algorithm items are stepwise added
to a relatively small initial item set. Both algorithms are
evaluated by means of generated data. The results for the
top-down algorithm are poor, whereas the results for the
bottom-up algorithm are more encouraging.

Key words: item selection, log-likelihood ratio test,
quasi-loglinear models, Rasch model.
Stepwise Item Selection Procedures for Rasch
Scales Using Quasi-loglinear Models

When constructing Rasch (1960) scales from a large set
of items, it often happens that the Rasch model does not fit
the entire item set. This lack of fit is due to the rather
strong assumptions of the Rasch model (cf. Molenaar, 1983),
e.g., unidimensionality of the underlying ability and local
stochastic independence of item scores. Therefore, a
two-step procedure is usually recommended for constructing
Rasch scales.

The first step involves the identification of one or 
more subsets of items approximately satisfying the Rasch 
model. This identification can, for instance, be based upon a
multidimensional representation of the items (cf. Knol, 1986,
1987a) by dividing the space into subspaces.

The second step consists of iteratively deleting one
item at a time from a relatively large initial subset, or of
adding one item at a time to a relatively small initial
subset. Usually, deletion of items is based upon item
statistics incorporated in computer programs for the Rasch
model. For example, the program PML (Molenaar, 1981) gives
biserial correlations, U statistics (Molenaar, 1983) and
contributions of the items to overall goodness of fit tests.
However, decisions based upon these indices are highly 
subjective, partly because of the sometimes contradictory 
information they provide. 
Especially for large scale applications, there is a need
for automatic procedures. Such procedures should preferably
be based upon sound statistical tests. Some efforts in this
direction have already been made. Verhelst (1983) proposed a
stepwise procedure based upon a log-likelihood ratio test.
However, it seems that his test is statistically not entirely
well founded (Knol, 1987b). Moreover, the procedure does not
seem to work satisfactorily in practice (Knol, 1987b). Another,
potentially more promising procedure comes from
quasi-loglinear modeling (Bishop, Fienberg & Holland, 1975;
Kelderman, 1987), in which specific hypotheses can be tested.
Kelderman (1984) showed that the Rasch model can be written
as a quasi-loglinear model. This offers the possibility to
detect specific violations of the Rasch model by items.

In this paper, a log-likelihood ratio test based upon a
quasi-loglinear formulation of the Rasch model will be
presented in which the conditional Rasch model (Fischer,
1974) is tested against an alternative model which
incorporates violations of the Rasch model by a particular
item. A stepwise top-down procedure based upon this
log-likelihood ratio test will be given, in which one item at a
time is deleted from a relatively large initial item set.
Also, a bottom-up algorithm will be given in which
one item at a time is added stepwise to a relatively small
initial item set already satisfying the Rasch model. In order
to evaluate both algorithms, the procedures will be applied to
some generated data sets.
The Rasch Model as a Quasi-loglinear Model



In loglinear models, the logarithms of expected cell 
frequencies or counts m are explained in terms of linear 
combinations of functions of observable categorical
variables. A subclass of loglinear models arises in the case 
of a priori or structurally zero cells. These models are 
called quasi-loglinear models (cf. Bishop, Fienberg & 
Holland, 1975, Ch. 5). Kelderman (1984, p. 226) showed that 
the conditional Rasch model (Fischer, 1974) can be written as
a quasi-loglinear model. For our purpose it is assumed that 
the Rasch model contains no subgroups based upon external 
information such as sex or age. The only subgroups we deal 
with are score groups. In the analysis of variance or u-terms 
parametrization (Bishop, Fienberg & Holland, 1975) the 
logarithms of the expected cell counts m for the conditional 
Rasch model (without external subgroups) can be written as 

$$\ln m_{x_1 \ldots x_k t} = u + \sum_{j=1}^{k} u_j(x_j) + u_{k+1}(t), \tag{1}$$

where $u$ is a constant term, $u_j(x_j)$ is the main effect of
response $x_j$ ($x_j = 0, 1$) of item $j$ ($j = 1, \ldots, k$) and
$u_{k+1}(t)$ is the main effect of score $t$ ($t = 0, \ldots, k$).

The number of estimable parameters of a quasi-loglinear
model is equal to the difference between the number of
parameters and the number of constraints imposed by the
model. The number of estimable parameters can be obtained
numerically by computing the rank of the so-called design
matrix (cf. Bock, 1975, p. 523) of the quasi-loglinear model.
An alternative procedure, which can be applied for relatively
simple models such as (1), consists of counting the number of
estimable parameters by correcting for the constraints. This
approach will be followed in the present paper. Following the
procedure of Kelderman (1984, pp. 231-232), the constant term
$u$ counts as one parameter. Furthermore, each term $u_j(x_j)$
($j = 1, \ldots, k$) counts as one parameter and $u_{k+1}(t)$
($t = 0, \ldots, k$) as $k + 1 - 1 = k$ parameters. Finally, we
have the constraint $\sum_j x_j = t$.
Adding the numbers yields the number of estimable parameters
of model (1) as $1 + k + k - 1 = 2k$. Model (1) can be tested
against the fully saturated model by the log-likelihood ratio
test or by Pearson's goodness of fit test (Kelderman, 1984).
Both test statistics are asymptotically $\chi^2$ distributed with
degrees of freedom equal to the difference between the number
of structurally nonzero cells and the number of estimable
parameters of model (1). However, these test statistics will
not be used in this paper, since they are only asymptotically
$\chi^2$ distributed and the number of degrees of freedom is
$2^k - 2k$, which is large already for moderate values of $k$.
Instead, model (1) will be tested against a model which
allows, for each item $i$ ($i = 1, \ldots, k$) separately, all
first-order interaction terms containing the item response
$x_i$.

For each $i$ ($i = 1, \ldots, k$), the model with all first-order
interaction terms containing $x_i$ is
$$\ln m_{x_1 \ldots x_k t} = u + \sum_{j=1}^{k} u_j(x_j) + u_{k+1}(t) + \sum_{j \neq i} u_{ij}(x_i x_j) + u_{i(k+1)}(x_i t), \tag{2}$$

where $u_{ij}(x_i x_j)$ is the (first-order) interaction
between responses $x_i$ and $x_j$ of items $i$ and $j$, and
$u_{i(k+1)}(x_i t)$ is the (first-order) interaction between
response $x_i$ of item $i$ and score $t$. An interaction term
$u_{ij}(x_i x_j)$ can be interpreted as a measure of local
dependence between responses $x_i$ and $x_j$ (Kelderman, 1984,
p. 224). Hence, the sum $\sum_{j \neq i} u_{ij}(x_i x_j)$ is a
measure of local dependence between item $i$ and the remaining
items. The interaction term $u_{i(k+1)}(x_i t)$ can be
interpreted as a measure of invariance of the item response
function of item $i$ over score groups (Kelderman, 1984,
p. 224). In model (2) u-terms are incorporated which reflect
both violation of unidimensionality of item $i$ and violations
of local independence of that item with the remaining items.

It can be proved that model (2) is separable (cf.
Bishop, Fienberg & Holland, 1975, Ch. 5) and that the
log-likelihood of model (2) equals the sum of the
log-likelihoods for model (1), computed separately for the data
sets with $x_i = 0$ and $x_i = 1$. It is easily seen that the
number of estimable parameters of model (2) equals twice the
number of estimable parameters of model (1) for $k - 1$ items,
i.e. $2[2(k - 1)] = 4k - 4$.
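The parameter counts derived so far can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text ($2k$ for model (1), $4k - 4$ for model (2), $2^k$ structurally nonzero cells); the function name `rasch_parameter_counts` is a hypothetical label introduced for illustration and is not part of LOGIMO or PML.

```python
def rasch_parameter_counts(k: int) -> dict:
    """Parameter and degree-of-freedom counts for k dichotomous items.

    Illustrative helper (hypothetical name); it only encodes the
    counting rules stated in the text.
    """
    if k < 3:
        raise ValueError("need k >= 3 so that 2k - 4 is positive")
    n_cells = 2 ** k              # one structurally nonzero cell per response pattern
    p_model1 = 1 + k + k - 1      # constant + item terms + score terms - constraint
    p_model2 = 2 * (2 * (k - 1))  # model (1) fitted twice on the other k - 1 items
    return {
        "cells": n_cells,
        "model1": p_model1,
        "model2": p_model2,
        "df_saturated": n_cells - p_model1,   # model (1) against the saturated model
        "df_item_test": p_model2 - p_model1,  # model (1) against model (2)
    }


if __name__ == "__main__":
    for k in (5, 10, 15):
        print(k, rasch_parameter_counts(k))
```

For 15 items the test against the saturated model already has 32,738 degrees of freedom, whereas the item-wise test has only 26, which illustrates why the saturated comparison is not pursued.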
Since computation time for model (2) is large compared to
model (1), it is more efficient to compute the log-likelihood
of model (2) as the sum of the log-likelihoods of model (1)
computed separately for the two data sets with $x_i = 0$ and
$x_i = 1$.

For each item $i$ ($i = 1, \ldots, k$), the Rasch model (1) can
be tested against model (2) by the log-likelihood ratio test

$$G^2 = -2[L_1 - L_2(i)] = -2[K_1 - K_2(i)], \tag{3}$$

where $L_1$ and $L_2(i)$ denote the log-likelihoods of models (1)
and (2), respectively, whereas $K_1$ and $K_2(i)$ are the kernels
(Kelderman & Steen, 1988) of the log-likelihoods of models
(1) and (2), respectively. Under the assumption of model (1),
the test statistic $G^2$ is asymptotically $\chi^2$ distributed
with degrees of freedom equal to the difference between the
numbers of estimable parameters of model (2) and model (1),
i.e. $(4k - 4) - 2k = 2k - 4$. Since computation of the
log-likelihood is often impossible or very expensive for larger
values of $k$, (3) will be obtained by computation of the
kernels.
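To make the construction of test (3) concrete, the following sketch splits a data set on the response to item $i$, drops that item, and combines three kernel values into $G^2$ and a p-value. It is an illustration under stated assumptions, not the LOGIMO implementation: `kernel` is a hypothetical user-supplied routine returning the kernel of the log-likelihood of model (1) for a data set, and the names `split_on_item`, `chi2_sf_even_df` and `item_g2` are introduced here. Because $2k - 4$ is always even, the upper-tail $\chi^2$ probability has a closed form and no statistics library is needed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, ...]  # one person's 0/1 responses


def split_on_item(data: Sequence[Sequence[int]], i: int) -> Tuple[List[Row], List[Row]]:
    """Split persons by their response to item i and drop that item."""
    groups: dict = {0: [], 1: []}
    for row in data:
        rest = tuple(x for j, x in enumerate(row) if j != i)
        groups[row[i]].append(rest)
    return groups[0], groups[1]


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for even df (closed-form sum)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def item_g2(data: Sequence[Sequence[int]], i: int,
            kernel: Callable[[Sequence[Sequence[int]]], float]) -> Tuple[float, float]:
    """Return (G^2, p-value) of test (3) for item i.

    `kernel` is a hypothetical fitting routine for model (1); the kernel
    of model (2) is taken as the sum over the two split data sets.
    """
    k = len(data[0])
    zeros, ones = split_on_item(data, i)
    k1 = kernel(data)
    k2 = kernel(zeros) + kernel(ones)
    g2 = -2.0 * (k1 - k2)
    return g2, chi2_sf_even_df(g2, 2 * k - 4)
```

Three kernel evaluations are needed per item here; the single run for the full item set can be shared across items.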

In the next section, two algorithms for constructing a
Rasch scale will be presented based upon the log-likelihood
ratio test (3).
Two Algorithms

Analogous to the algorithm of Verhelst (1983), a top-down
algorithm can be constructed in which stepwise one item
at a time is deleted from an initial set of items. The
computations will be done with the program LOGIMO (Kelderman
& Steen, 1988).

Step 1. Start with an initial item set S consisting of (say)
k items.

Step 2. Run the program for model (1) for the item set S.
Compute for each item $i \in S$ ($i = 1, \ldots, k$) the $G^2$
test statistic (3). In our implementation, this involves
running the program LOGIMO 2k+1 times. Select the
item i* with the largest $G^2$ value.

Step 3. Compute for the selected item i* the p-value of the
test statistic $G^2$. If p<.05 then model (1) is
rejected in favour of model (2). This means that
unidimensionality of item i* and/or local
independence of that item with the remaining items is
violated. If p<.05 then delete item i* from the item
set S (i.e. update S) and repeat steps 2 and 3 until
no item from set S can be deleted any more.

Step 4. Evaluate the constructed scale with Andersen's (1973)
log-likelihood ratio test and with the Martin-Löf
(1973) test.
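Steps 1 through 3 of the top-down algorithm can be sketched as a short loop. This is a minimal illustration, not the authors' LOGIMO-based implementation: `item_test` is a hypothetical callback that returns the pair ($G^2$, p-value) of test (3) for one item within a given item set, and the function name `top_down` is likewise introduced here. Step 4 (the Andersen and Martin-Löf tests) is left to an external program.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical callback: (current item set, item) -> (G^2, p-value) of test (3).
ItemTest = Callable[[FrozenSet[int], int], Tuple[float, float]]


def top_down(items: Iterable[int], item_test: ItemTest,
             alpha: float = 0.05) -> Tuple[List[int], List[int]]:
    """Delete one item at a time until no item is rejected."""
    scale = set(items)
    removed: List[int] = []
    # Test (3) has 2k - 4 degrees of freedom, so it needs at least 3 items.
    while len(scale) >= 3:
        current = frozenset(scale)
        results = {i: item_test(current, i) for i in sorted(scale)}
        worst = max(results, key=lambda i: results[i][0])  # largest G^2
        if results[worst][1] >= alpha:
            break                                          # nothing left to delete
        scale.remove(worst)
        removed.append(worst)
    return sorted(scale), removed


if __name__ == "__main__":
    # Toy callback for demonstration only: items 11 and 12 always misfit.
    def fake_test(scale, i):
        return (100.0, 0.001) if i in (11, 12) else (1.0, 0.5)

    print(top_down([1, 2, 3, 4, 5, 11, 12], fake_test))
```

The toy callback in the usage example is deliberately unrealistic: with real data the misfit of one item also affects the test statistics of the other items, which is the weakness of the top-down approach discussed in the text.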
It is also possible to define a bottom-up algorithm (cf.
Verhelst, 1983), in which stepwise an item is added to a
(small) set of items already satisfying the Rasch model. A
bottom-up algorithm based upon the log-likelihood ratio test
(3) is statistically more appropriate than the top-down
algorithm, because the test statistic (3) is only $\chi^2$
distributed when the null hypothesis (i.e. model (1)) is true
(cf. Verhelst, 1983). For the top-down algorithm the
assumption that the Rasch model (1) holds can hardly be
made. However, for the bottom-up algorithm this assumption
can be made, provided that it is possible to select a (small)
initial item set that satisfies the Rasch model.

A bottom-up algorithm based upon the log-likelihood
ratio test (3) can be stated as:



Step 1. Start with an initial set S' of k' items that already
satisfies the Rasch model and a non-overlapping set C
of n items containing the items that can potentially
be added to the Rasch scale.

Step 2. Compute for each item $i \in C$ ($i = 1, \ldots, n$) the
test statistic (3) for the k'+1 items of the set S' + {i}.
In our implementation, this involves running the
program LOGIMO 3n times. Select the item i* with the
smallest $G^2$ value.

Step 3. Compute for the selected item i* the p-value of $G^2$.
If p>.05 then model (1) cannot be rejected. This
means that the set of items S' + {i*} satisfies the
Rasch model. If p>.05 then add item i* to the item
set S' (i.e. update S' and C) and repeat steps 2 and 3
until no item from set C can be added any more.

Step 4. Evaluate the constructed scale with Andersen's (1973)
log-likelihood ratio test and with the Martin-Löf
(1973) test.
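The bottom-up steps can be sketched in the same way. Again this is only an illustration under assumptions: `item_test` is a hypothetical callback returning ($G^2$, p-value) of test (3) for a candidate item within the enlarged set S' + {i}, and `bottom_up` is a name introduced here rather than a routine from LOGIMO.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical callback: (item set including the candidate, candidate) -> (G^2, p-value).
ItemTest = Callable[[FrozenSet[int], int], Tuple[float, float]]


def bottom_up(start: Iterable[int], candidates: Iterable[int],
              item_test: ItemTest, alpha: float = 0.05) -> Tuple[List[int], List[int]]:
    """Add one candidate at a time while the Rasch model is not rejected."""
    scale = set(start)                 # S': assumed to satisfy the Rasch model
    pool = set(candidates) - scale     # C: non-overlapping candidate items
    added: List[int] = []
    while pool:
        results = {i: item_test(frozenset(scale | {i}), i) for i in sorted(pool)}
        best = min(results, key=lambda i: results[i][0])   # smallest G^2
        if results[best][1] <= alpha:
            break                                          # best candidate is rejected
        scale.add(best)
        pool.remove(best)
        added.append(best)
    return sorted(scale), added


if __name__ == "__main__":
    # Toy callback for demonstration only: items 11 and 12 never fit.
    def fake_test(scale, i):
        return (100.0, 0.001) if i in (11, 12) else (1.0, 0.5)

    print(bottom_up([3, 4], [1, 2, 5, 6, 11, 12], fake_test))
```

The loop stops as soon as the best remaining candidate is rejected, since every other candidate has a larger $G^2$ for the same degrees of freedom.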



For each iteration cycle of the bottom-up algorithm, the
program has to be run 3n times whereas the number of runs for
the top-down algorithm is only 2k+1. However, since the
cardinality k' of the start set S' of the bottom-up algorithm 
is typically much smaller (especially during the first 
iteration cycles) than the cardinality k of the start set S 
of the top-down algorithm, it can be expected that CPU-time 
for the bottom-up algorithm will be less than that for the 
top-down algorithm. 
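As a small worked example of these run counts, the sketch below evaluates both formulas for the first cycle of the simulation design described later (15 items for the top-down start set, 11 candidates for the bottom-up algorithm). The function names are hypothetical; the counts say nothing about the cost of a single run, which depends on the size of the item set being fitted.

```python
def runs_top_down(k: int) -> int:
    """Program runs per top-down cycle: one for model (1), two per item for model (2)."""
    return 2 * k + 1


def runs_bottom_up(n: int) -> int:
    """Program runs per bottom-up cycle: three per candidate item."""
    return 3 * n


if __name__ == "__main__":
    print("top-down, k = 15:", runs_top_down(15))     # 31 runs on 15 (or 14) items
    print("bottom-up, n = 11:", runs_bottom_up(11))   # 33 runs on 5 (or 4) items
```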

In the next section the performances of the top-down and 
bottom-up algorithms will be evaluated empirically using some 
generated data sets. 



A Simulation Study 

Data have been generated according to the two-parameter
multidimensional logistic model (Reckase, 1973). In this
model, the item response function for item i is given by

$$P_i(\theta) = \{1 + \exp[-1.6(\alpha_i'\theta - \beta_i)]\}^{-1}, \tag{4}$$

where $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{im})'$ is a vector of
item discrimination parameters, $m$ is the dimensionality of the
ability space, $\beta_i$ is the item difficulty parameter and
$\theta = (\theta_1, \ldots, \theta_m)'$ is the m-dimensional
vector of abilities. Note that model (4) allows
for items with different dimensionality. For example, with
(4) items can be generated which have a nonzero
discrimination parameter on one dimension and zeroes on the
remaining dimensions. Note also that (4) reduces to the Rasch
model if $m = 1$ and $\alpha_i = \alpha$ (constant) for all items
i. Model (4) allows for items violating the Rasch model in the
sense of different slopes (discrimination) and different
dimensionality (violating the local independence). In Table 1
the item parameters of four generated unidimensional data
sets are given, and in Table 2 the item parameters of two
generated two-dimensional data sets.
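Data generation under model (4) can be sketched with the standard library alone. The code below is an illustrative assumption about how such data might be produced, not the generator used in the study: the function names, the seed handling and the use of independent standard normal abilities per dimension are choices made here.

```python
import math
import random
from typing import List, Sequence, Tuple


def response_probability(alpha: Sequence[float], beta: float,
                         theta: Sequence[float]) -> float:
    """Item response function (4) for one item and one ability vector."""
    z = sum(a * t for a, t in zip(alpha, theta)) - beta
    return 1.0 / (1.0 + math.exp(-1.6 * z))


def generate_responses(alphas: Sequence[Sequence[float]], betas: Sequence[float],
                       n_persons: int, seed: int = 0) -> List[Tuple[int, ...]]:
    """Draw 0/1 responses for n_persons with abilities sampled from N(0, I)."""
    rng = random.Random(seed)
    m = len(alphas[0])                     # dimensionality of the ability space
    data = []
    for _ in range(n_persons):
        theta = [rng.gauss(0.0, 1.0) for _ in range(m)]
        row = tuple(
            int(rng.random() < response_probability(a, b, theta))
            for a, b in zip(alphas, betas)
        )
        data.append(row)
    return data


if __name__ == "__main__":
    # Three Rasch items on dimension 1 and one item loading on dimension 2.
    alphas = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    betas = [-1.0, 0.0, 1.0, 0.0]
    sample = generate_responses(alphas, betas, n_persons=1000, seed=1)
    print([sum(col) / len(sample) for col in zip(*sample)])
```

Setting the second discrimination to zero for an item makes it depend on the first ability only, which is how items of different dimensionality arise within one data set.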



Insert Tables 1 and 2 about here 



The data sets are constructed such that the items 1 through
10 of each data set form a Rasch scale with discrimination
parameters $\alpha_i = \alpha = 1$ ($i = 1, \ldots, 10$). In all
data sets the items 11 through 15 differ in discrimination
(sets 1 through 4 and 6) and/or dimensionality (sets 5 and 6)
from the (dominant) Rasch scale.
Abilities were randomly sampled from a multivariate
normal (0, I) distribution. The sample size was chosen to be
1000, which was expected to be large enough to get
sufficiently reliable results. It is likely that the repeated
use of the test (3) in the algorithms will result in chance
capitalization. For each data set a second, independent data
set with sample size 1000 was generated to evaluate the final
scales found by the algorithms.

Both the top-down and the bottom-up algorithm were
applied to all data sets. The top-down algorithm starts with
the item set S, consisting of all 15 items, whereas the
bottom-up algorithm has the start set S', consisting of the
items 4 through 7, and the remaining items form the set C of
candidate items.

Since in model (2) only first-order interaction terms
have been incorporated and no overall goodness of fit test is
available for model (2) because of the too large number of
degrees of freedom, it is necessary to evaluate the item
selection procedure by an external criterion. In the program
PML (Molenaar, 1981) the goodness of fit test of
Martin-Löf (1973) and the log-likelihood ratio test of Andersen
(1973) are implemented. Both tests were used to evaluate the
obtained Rasch scale after the selection procedures.

From Table 1 it can be seen that data set 1 contains,
besides the dominating Rasch scale (items 1 through 10), a
subscale consisting of the items 11 through 15 which have
higher discrimination parameters ($\alpha = 1.4$) than the
discrimination parameters of the dominating scale
($\alpha = 1$). Data set 2 contains a subscale consisting of
items with relatively low discriminations ($\alpha = .6$). The
results of the algorithms for the data sets 1 and 2 are given
in Tables 3 and 4, respectively.



Insert Tables 3 and 4 about here 



In order to give more insight into the item selection
procedures, the p-values of Martin-Löf's test and Andersen's
log-likelihood ratio test are given after each added or
deleted item. Additionally, as a baseline the p-values of 
both tests are given for the theoretically expected scale(s). 
Also, the p-values of both tests are given for the cross- 
validated scales. From Tables 3 and 4 it is seen that the 
top-down algorithm clearly fails to detect the dominating 
scale. In both cases, the obtained scale consists of a 
mixture of items of the dominating scale and the subscale. 
The outcomes of the bottom-up algorithm for the data sets 1
and 2 are better: the obtained scales contain all items
of the dominating scale and (only) two items of the subscale.
In a sense, the bottom-up algorithm iterates too long. After
k=9, the algorithm starts to select items from the subscale.

Contrary to the first two data sets, the data sets 3 and
4 contain no substantial subscale. The difference between
data sets 3 and 4 is that the discrimination parameters of the
former are more extreme than those of the latter. The results
of the algorithms for data sets 3 and 4 are given in Tables 5
and 6, respectively.



Insert Tables 5 and 6 about here 



From Table 5 it can be seen that the top-down algorithm for 
data set 3 yields the dominating scale. The top-down 
algorithm applied to data set 4, however, yields a mixture of 
items from the dominating scale and three of the remaining 
items. Obviously, the algorithm cannot differentiate well
between items of the dominating scale with discrimination
parameter $\alpha = 1$ and items with a slightly different
discrimination parameter ($\alpha = .8$ or $\alpha = 1.2$).
However, the resulting scale has good overall goodness of fit
values. As in the case of data sets 1 and 2, the bottom-up
algorithm for data sets 3 and 4 iterates too long. After an
optimum has been reached (k=9 for data set 3 and k=10 for data
set 4), the algorithm still adds items whereas it would be
better if the algorithm had been stopped.

The data sets 5 and 6 are two-dimensional, where the
items 11 through 15 measure another trait than the items 1
through 10 do. Data set 5 contains a Rasch subscale
consisting of the items 11 through 15 whereas data set 6 does
not. The results of the algorithms for data sets 5 and 6 are
given in Tables 7 and 8, respectively.
Insert Tables 7 and 8 about here



From Tables 7 and 8 it can be seen that in both cases the
top-down algorithm clearly fails to yield the dominating
Rasch scale. For data set 5, the top-down algorithm could not
discriminate well between the two subscales. Firstly, the
items 3, 4, 5, 7 and 8 of the dominating scale, which have
moderate difficulty parameters, are deleted. Secondly, all
items of the subscale (items 11 through 15) are deleted.
However, the resulting scale has good overall goodness of fit
values and contains only items of the dominating scale.
Nevertheless the outcome is not satisfactory because the scale
consists of only five items. In both cases the bottom-up
algorithm performs well, yielding the dominating scale
consisting of the items 1 through 10.

In summary, the top-down algorithm yields the complete
dominating scale in only one case (data set 3). The
performance of the bottom-up algorithm is more encouraging.
For the data sets 5 and 6, where the items 11 through 15
measure another trait than the trait measured by the items of
the dominating scale, the algorithm yields precisely the
dominating scale. For the other data sets, where the items 11
through 15 differ only in discrimination parameters from the
items of the dominating scale, the resulting scale consists
of all items of the dominating scale (except for data set 4)
and only one or two items of the second subscale. For data
set 4, the items 1 and 2 do not belong to the resulting
scale.

Discussion 

From the results presented in the last section, it is
clear that the bottom-up algorithm is more promising than the
top-down algorithm. Moreover, the bottom-up algorithm is
statistically better justified than the top-down algorithm
(cf. Verhelst, 1983). Finally, the bottom-up algorithm is
typically faster than the top-down algorithm because the
former starts with a smaller initial item set. However, it
has to be noted that CPU-times for the bottom-up algorithm
are still very large.

Furthermore, in the presented simulation study the 
bottom-up algorithm starts with an initial item set already 
forming a Rasch scale. Of course, in practice we do not have
this knowledge. Without a priori knowledge, it seems very
difficult to select a small item set that satisfies the Rasch
model.

In the simulation study it seems that the bottom-up
algorithm iterates too long: after cycle 4 or 5 the algorithm
starts to select items from the subscale. An improvement
could be to increase the significance level of the
log-likelihood ratio test (3). Another, probably better
possibility is to alternate the bottom-up algorithm with one
or two iterations of the top-down algorithm. This allows the
procedure to reject items that have been added incorrectly to
the scale in a previous step. An additional advantage of such
a mixed procedure would be that the choice of the start set is
less critical.
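One possible reading of such a mixed procedure is sketched below: every bottom-up addition is followed by a single top-down check of the enlarged scale. This is a speculative illustration of the suggestion, not a procedure that was run in the study. The callback `item_test` (returning $G^2$ and p-value of test (3)) and the function name `mixed_selection` are hypothetical, and the rule that a rejected item is not returned to the candidate pool is an added assumption that guarantees termination.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical callback: (item set, item) -> (G^2, p-value) of test (3).
ItemTest = Callable[[FrozenSet[int], int], Tuple[float, float]]


def mixed_selection(start: Iterable[int], candidates: Iterable[int],
                    item_test: ItemTest, alpha: float = 0.05) -> List[int]:
    """Alternate one bottom-up addition with one top-down check."""
    scale = set(start)
    pool = set(candidates) - scale
    while pool:
        # Bottom-up step: try the candidate with the smallest G^2.
        adds = {i: item_test(frozenset(scale | {i}), i) for i in sorted(pool)}
        best = min(adds, key=lambda i: adds[i][0])
        if adds[best][1] <= alpha:
            break
        scale.add(best)
        pool.remove(best)
        # Top-down step: re-test every item of the enlarged scale once.
        if len(scale) >= 3:
            current = frozenset(scale)
            checks = {i: item_test(current, i) for i in sorted(scale)}
            worst = max(checks, key=lambda i: checks[i][0])
            if checks[worst][1] < alpha:
                scale.remove(worst)   # assumption: not returned to the pool
    return sorted(scale)


if __name__ == "__main__":
    # Toy callback: item 11 looks acceptable until item 2 has joined the scale.
    def fake_test(scale, i):
        if i == 11:
            return (100.0, 0.001) if 2 in scale else (0.5, 0.6)
        return (1.0, 0.5)

    print(mixed_selection([3, 4], [1, 2, 11], fake_test))
```

In the toy run, item 11 is added first and removed again once the scale has grown, which is the kind of correction the mixed procedure is meant to allow.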

In the alternative model (2), only first-order
interaction terms have been incorporated. A possible
explanation of the rather disappointing outcomes of both
algorithms is that higher-order interaction terms are
needed to describe the (induced) violations of the Rasch
model. However, incorporating higher-order interactions in
the alternative model will make the algorithms much more
expensive and, even worse, it will be impossible to run the
top-down algorithm for large item sets. Finally, it has to be
noted that because of the repeated use of the test (3), it is
likely that chance capitalization occurs. Therefore, with 
real data the final scales have to be cross-validated in an 
independent sample. 
Table 1

Item parameters of the four generated unidimensional data sets
1 through 4.

          set 1         set 2         set 3         set 4
item     α     β       α     β       α     β       α     β
  1      1   -1.8      1   -1.8      1   -1.8      1   -1.8
  2      1   -1.4      1   -1.4      1   -1.4      1   -1.4
  3      1   -1.0      1   -1.0      1   -1.0      1   -1.0
  4      1   -0.6      1   -0.6      1   -0.6      1   -0.6
  5      1   -0.2      1   -0.2      1   -0.2      1   -0.2
  6      1    0.2      1    0.2      1    0.2      1    0.2
  7      1    0.6      1    0.6      1    0.6      1    0.6
  8      1    1.0      1    1.0      1    1.0      1    1.0
  9      1    1.4      1    1.4      1    1.4      1    1.4
 10      1    1.8      1    1.8      1    1.8      1    1.8
 11     1.4  -2.0     0.6  -2.0     0.6  -1.0     0.8  -1.0
 12     1.4  -1.0     0.6  -1.0     0.6   1.0     0.8   1.0
 13     1.4   0.0     0.6   0.0     1.4  -1.0     1.2  -1.0
 14     1.4   1.0     0.6   1.0     1.4   0.0     1.2   0.0
 15     1.4   2.0     0.6   2.0     1.4   1.0     1.2   1.0
Table 2

Item parameters of the two generated two-dimensional data sets
5 and 6.

              set 5                 set 6
item     α1    α2     β        α1    α2     β
  1      1     0    -1.8       1     0    -1.8
  2      1     0    -1.4       1     0    -1.4
  3      1     0    -1.0       1     0    -1.0
  4      1     0    -0.6       1     0    -0.6
  5      1     0    -0.2       1     0    -0.2
  6      1     0     0.2       1     0     0.2
  7      1     0     0.6       1     0     0.6
  8      1     0     1.0       1     0     1.0
  9      1     0     1.4       1     0     1.4
 10      1     0     1.8       1     0     1.8
 11      0          -2.0       0    0.6   -1.0
 12      0          -1.0       0    0.6    1.0
 13      0           0.0       0    1.4   -1.0
 14      0           1.0       0    1.4    0.0
 15      0           2.0       0    1.4    1.0
Table 3

P-values of the Martin-Löf χ² and Andersen's log-likelihood
ratio test of the top-down and bottom-up algorithms for data
set 1.

                                     added/     test statistic
                                     deleted
algorithm   scale              k     item        χ²      LR
baseline    1-10              10                 .36     .31
            11-15              5                 .66      *
top-down    1-15              15                 .27     .00
                              14      -6         .13     .00
                              13      -5         .23     .01
                              12      -7         .47     .00
            1-4,8-11,13-15    11     -12         .91
cross-validation              11                 .00     .12
bottom-up   4-7                4                 .02     .00
                               5      +3         .05     .96
                               6     +10         .40     .10
                               7      +2         .89     .13
                               8      +8         .80     .04
                               9      +1         .79     .61
                              10     +15         .84     .00
                              11     +12         .09     .00
            1-10,12,15        12      +9         .03     .00
cross-validation              12                 .00     .14

* Andersen's LR test cannot be computed.
Table 4

P-values of the Martin-Löf χ² and Andersen's log-likelihood
ratio test of the top-down and bottom-up algorithms for data
set 2.

                                     added/     test statistic
                                     deleted
algorithm   scale              k     item        χ²      LR
baseline    1-10              10                 .61     .83
            11-15              5                 .70     .96
top-down    1-15              15                 .00     .00
                              14      -2         .00     .00
                              13      -5         .01     .01
                              12      -7         .04
                              11      -4         .24
                              10      -6         .34
            1,8-15             9      -3         .87
cross-validation               9                 .10     .03
bottom-up   4-7                4                 .99     .96
                               5      +9         .95     .81
                               6     +10         .95     .73
                               7      +3         .92     .63
                               8      +1         .95     .68
                               9      +2         .88     .45
                              10     +15         .93     .77
                              11      +8         .81     .24
            1-11,15           12     +11         .89     .09
cross-validation              12                 .14     .01

* Andersen's LR test cannot be computed.
Table 5

P-values of the Martin-Löf χ² and Andersen's log-likelihood
ratio test of the top-down and bottom-up algorithms for data
set 3.

                                     added/     test statistic
                                     deleted
algorithm   scale              k     item        χ²      LR
baseline    1-10              10                 .94     .46
top-down    1-15              15                 .00     .00
                              14     -14         .00     .00
                              13     -12         .01     .00
                              12     -13         .06     .03
                              11     -11         .69     .64
            1-10              10     -15         .94     .40
cross-validation              10
bottom-up   4-7                4                 .78     .91
                               5      +8         .75     .91
                               6      +3         .95     .98
                               7      +1         .99     .83
                               8      +2         .99     .50
                               9     +10         .99     .73
                              10     +14         .35     .57
            1-10,14           11      +9         .18     .29
cross-validation              11                 .33     .05
Table 6

P-values of the Martin-Löf χ² and Andersen's log-likelihood
ratio test of the top-down and bottom-up algorithms for data
set 4.

                                     added/     test statistic
                                     deleted
algorithm   scale              k     item        χ²      LR
baseline    1-10              10                 .48     .67
top-down    1-15              15                 .16     .06
                              14      -7         .17     .01
                              13      -3         .25     .02
                              12     -15         .17     .06
                              11     -12         .28     .06
                              10      -4         .24     .07
            1,2,5,8-11,13,14   9      -6         .87     .85
cross-validation               9                 .88     .14
bottom-up   4-7                4                 .14     .82
                               5      +9         .38     .95
                               6     +11         .92     .95
                               7      +3         .78     .78
                               8     +15         .71     .39
                               9      +8         .66     .33
                              10     +10         .75     .78
            3-11,13,15        11     +13         .31     .02
cross-validation              11                 .64     .63
Table 7

P-values of the Martin-Löf χ² and Andersen's log-likelihood
ratio test of the top-down and bottom-up algorithms for data
set 5.

                                     added/     test statistic
                                     deleted
algorithm   scale              k     item        χ²      LR
baseline    1-10              10                 .49     .42
            11-15              5                 .27
top-down    1-15              15                 .00     .00
                              14      -8         .00     .00
                              13      -4         .00     .00
                              12      -7         .00     .00
                              11      -3         .04     .00
                              10      -5         .04     .00
                               9     -12         .09     .00
                               8     -13         .00     .00
                               7     -14         .00     .00
                               6     -15         .00     .00
            1,2,6,9,10         5     -11         .87     .68
cross-validation               5                 .75      *
bottom-up   4-7                4                 .46     .29
                               5      +3         .95     .55
                               6      +8         .76     .44
                               7      +1         .78     .44
                               8      +9         .35     .52
                               9     +10         .22     .24
            1-10              10      +2         .49     .42
cross-validation              10                 .75     .60

* Andersen's LR test cannot be computed.
Table 8

P-values of the Martin-Löf χ² and Andersen's log-likelihood
ratio test of the top-down and bottom-up algorithms for data
set 6.

                                     added/     test statistic
                                     deleted
algorithm   scale              k     item        χ²      LR
baseline    1-10              10                 .31     .70
top-down    1-15              15                 .00     .00
                              14     -15         .00     .00
                              13     -14         .00     .00
                              12      -7         .00     .00
                              11      -4         .00     .00
                              10      -6         .00     .00
                               9      -5         .00     .00
                               8      -3         .00     .05
                               7      -9         .67     .40
                               6      -2         .01     .35
                               5     -13         .00     .08
                               4      -8         .31     .87
            1,11,12            3     -10         .09     .10
cross-validation               3                 .63     .56
bottom-up   4-7                4                 .65     .73
                               5      +8         .63     .69
                               6      +2         .53     .54
                               7      +1         .41     .29
                               8     +10         .26     .37
                               9      +9         .49     .53
            1-10              10      +3         .31     .70
cross-validation              10                 .77     .63